A Preferential Attachment Model for the Stellar Initial Mass Function
Accurate specification of a likelihood function is becoming increasingly
difficult in many inference problems in astronomy. As sample sizes resulting
from astronomical surveys continue to grow, deficiencies in the likelihood
function lead to larger biases in key parameter estimates. These deficiencies
result from the oversimplification of the physical processes that generated the
data, and from the failure to account for observational limitations.
Unfortunately, realistic models often do not yield an analytical form for the
likelihood. The estimation of a stellar initial mass function (IMF) is an
important example. The stellar IMF is the mass distribution of stars initially
formed in a given cluster of stars, a population that is not directly
observable due to stellar evolution, other disruptions of the cluster, and
observational limitations. There are several difficulties with specifying a
likelihood in this setting since the physical processes and observational
challenges result in measurable masses that cannot legitimately be considered
independent draws from an IMF. This work improves inference of the IMF by using
an approximate Bayesian computation approach that both accounts for
observational and astrophysical effects and incorporates a physically-motivated
model for star cluster formation. The methodology is illustrated via a
simulation study, demonstrating that the proposed approach can recover the true
posterior in realistic situations, and is then applied to observations from
astrophysical simulation data.
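The approximate Bayesian computation (ABC) idea behind this abstract can be illustrated with a minimal rejection-sampling sketch. Everything here is a toy stand-in: a single-slope power-law IMF, a crude mean-log-mass summary statistic, and no observational effects, whereas the paper's forward model handles cluster formation and selection in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_imf(alpha, n, m_min=0.5, m_max=100.0):
    # Inverse-CDF draws from a power-law IMF, dN/dm proportional to m^-alpha
    u = rng.random(n)
    a = 1.0 - alpha
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

def summary(m):
    # Crude one-number summary of a cluster; a realistic version would fold
    # in completeness, stellar evolution, and measurement error
    return np.log(m).mean()

# "Observed" cluster drawn with a Salpeter-like slope of 2.35
s_obs = summary(sample_imf(2.35, 500))

# ABC rejection: draw slopes from the prior, simulate clusters, and keep
# slopes whose simulated summary lands within epsilon of the observed one
accepted = [a for a in rng.uniform(1.5, 3.5, 5000)
            if abs(summary(sample_imf(a, 500)) - s_obs) < 0.02]
posterior = np.array(accepted)
```

The accepted slopes approximate draws from the posterior without ever evaluating a likelihood, which is the core appeal of ABC when the likelihood has no analytical form.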
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation
The ratio between two probability density functions is an important component
of various tasks, including selection bias correction, novelty detection and
classification. Recently, several estimators of this ratio have been proposed.
Most of these methods fail if the sample space is high-dimensional, and hence
require a dimension reduction step, the result of which can be a significant
loss of information. Here we propose a simple-to-implement, fully nonparametric
density ratio estimator that expands the ratio in terms of the eigenfunctions
of a kernel-based operator; these functions reflect the underlying geometry of
the data (e.g., submanifold structure), often leading to better estimates
without an explicit dimension reduction step. We show how our general framework
can be extended to address another important problem, the estimation of a
likelihood function in situations where that function cannot be
well-approximated by an analytical form. One is often faced with this situation
when performing statistical inference with data from the sciences, due to the
complexity of the data and of the processes that generated those data. We
emphasize applications where using existing likelihood-free methods of
inference would be challenging due to the high dimensionality of the sample
space, but where our spectral series method yields a reasonable estimate of the
likelihood function. We provide theoretical guarantees and illustrate the
effectiveness of our proposed method with numerical experiments.
Comment: With supplementary material
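A bare-bones version of the spectral series estimator described above can be sketched in a few lines: estimate the eigenfunctions of a Gaussian kernel operator on the denominator sample, extend them to new points by the Nystrom formula, and read off the expansion coefficients of the ratio as averages over the numerator sample. The bandwidth, truncation level, and 1-D Gaussian data are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_kernel(a, b, h):
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * h * h))

# Samples from the two densities whose ratio p/q we want
x_q = rng.normal(0.0, 1.0, 500)    # denominator sample
x_p = rng.normal(0.5, 1.0, 500)    # numerator sample

h, J, n = 0.5, 10, len(x_q)

# Eigenfunctions of the kernel operator, estimated on the q-sample
K = gauss_kernel(x_q, x_q, h)
lam, vec = np.linalg.eigh(K)
lam, vec = lam[::-1][:J], vec[:, ::-1][:, :J]   # keep the top-J components

def psi(x):
    # Nystrom extension of the eigenfunctions to new points x
    return np.sqrt(n) * gauss_kernel(x, x_q, h) @ vec / lam

# Expansion coefficients of the ratio: since the eigenfunctions are
# orthonormal w.r.t. q, the j-th coefficient is the mean of psi_j under p
beta = psi(x_p).mean(axis=0)

def ratio(x):
    return np.clip(psi(x) @ beta, 0.0, None)
```

Because the expansion adapts to where the q-sample actually lives, the same code runs unchanged on high-dimensional inputs; only the kernel's distance computation changes.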
Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach of choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
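The quantization idea can be sketched with a toy template library and plain k-means, whose centroids play the role of the prototypes. The two "SSP parameters" and the spectral shapes below are invented; the point is only that a small learned basis quantizes the effective support of a large fine-grid library.

```python
import numpy as np

rng = np.random.default_rng(2)

# Mock SSP library: each "spectrum" is a smooth curve controlled by two
# hypothetical parameters (a log-age-like t and a metallicity-like z) laid
# out on a fine grid, mimicking computationally produced templates
wave = np.linspace(0.0, 1.0, 200)
tt, zz = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 25))
params = np.column_stack([tt.ravel(), zz.ravel()])
library = np.array([np.exp(-wave * (1 + 3 * t)) + 0.3 * z * np.sin(6 * wave)
                    for t, z in params])         # 1000 spectra x 200 pixels

def kmeans(X, k, iters=30):
    # Plain Lloyd's algorithm: the k centroids act as prototypes that
    # quantize the vector space spanned by the model components
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers, labels

prototypes, labels = kmeans(library, k=12)
err_proto = ((library - prototypes[labels]) ** 2).sum()  # quantization error
```

Fitting a signal model over 12 prototypes rather than 1000 grid templates is what delivers the computational savings the abstract describes.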
Reinterpreting Fundamental Plane Correlations with Machine Learning
This work explores the relationships between galaxy sizes and related
observable galaxy properties in a large volume cosmological hydrodynamical
simulation. The objectives of this work are both to develop a better
understanding of the correlations among galaxy properties and of the influence
of environment on galaxy physics, and to build an improved model for galaxy
sizes that builds off of the fundamental plane. With an accurate
intrinsic galaxy size predictor, the residuals in the observed galaxy sizes can
potentially be used for multiple cosmological applications, including making
measurements of galaxy velocities in spectroscopic samples, estimating the rate
of cosmic expansion, and constraining the uncertainties in the photometric
redshifts of galaxies. Using projection pursuit regression, the model
accurately predicts intrinsic galaxy sizes, with residuals that show only
limited correlation with galaxy properties. The model decreases the spatial
correlation of galaxy size residuals by a factor of 5 at small scales
compared to the baseline correlation when the mean size is used as a predictor.
Comment: 16 pages, 12 figures, MNRAS
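A single term of projection pursuit regression fits a ridge function y = g(w . x): a direction w and a smooth univariate g, estimated together. The sketch below uses invented data, a cubic polynomial for g, and Nelder-Mead over w; a full PPR implementation would add terms iteratively and use a more flexible smoother.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Mock data: the response depends nonlinearly on one direction in a small
# feature space, i.e. a single ridge function plus noise
X = rng.normal(size=(400, 3))
w_true = np.array([0.8, 0.6, 0.0])
y = np.tanh(X @ w_true) + 0.05 * rng.normal(size=400)

def loss(w):
    # Given a direction w, fit a cubic ridge function g to the projection
    # and score the residual; projection pursuit alternates exactly this way
    t = X @ (w / np.linalg.norm(w))
    g = np.polyval(np.polyfit(t, y, 3), t)
    return ((y - g) ** 2).mean()

res = minimize(loss, x0=np.array([1.0, 0.0, 0.0]), method="Nelder-Mead")
w_hat = res.x / np.linalg.norm(res.x)   # recovered direction (up to sign)
```

The recovered direction aligns with the true one, and the residuals after subtracting g(w . x) are what would be examined for correlations with other galaxy properties.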
A Statistical Method for Estimating Luminosity Functions using Truncated Data
The observational limitations of astronomical surveys lead to significant
statistical inference challenges. One such challenge is the estimation of
luminosity functions given redshift and absolute magnitude measurements
from an irregularly truncated sample of objects. This is a bivariate density
estimation problem; we develop here a statistically rigorous method which (1)
does not assume a strict parametric form for the bivariate density; (2) does
not assume independence between redshift and absolute magnitude (and hence
allows evolution of the luminosity function with redshift); (3) does not
require dividing the data into arbitrary bins; and (4) naturally incorporates a
varying selection function. We accomplish this by decomposing the bivariate
density into nonparametric and parametric portions. There is a simple way of
estimating the integrated mean squared error of the estimator; smoothing
parameters are selected to minimize this quantity. Results are presented from
the analysis of a sample of quasars.
Comment: 30 pages, 9 figures, Accepted for publication in ApJ
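The effect of a varying selection function can be seen in a toy truncated sample. This sketch uses simple inverse-selection weighting rather than the semiparametric decomposition the abstract describes, and every distribution and survey limit in it is invented for illustration.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(4)

# Mock "complete" population: redshift z and absolute magnitude M
n = 4000
z = rng.uniform(0.1, 2.0, n)          # true z-distribution: uniform
M = rng.normal(-22.0, 1.0, n)         # true M-distribution: N(-22, 1)

# Flux-limited truncation: intrinsically faint objects are missed at high z
M_lim = -18.0 - 2.5 * z               # hypothetical survey limit
obs = M < M_lim
z_obs = z[obs]

def selection(zz):
    # Probability that an object at redshift zz passes the magnitude cut,
    # i.e. P(M < M_lim(zz)) under the N(-22, 1) magnitude distribution
    return 0.5 * (1.0 + erf(((-18.0 - 2.5 * zz) + 22.0) / np.sqrt(2.0)))

# Inverse-selection weighting: each observed object stands in for
# 1/selection objects in the complete population
w = np.array([1.0 / selection(zz) for zz in z_obs])
hist_raw, edges = np.histogram(z_obs, bins=5, range=(0.1, 2.0))
hist_w, _ = np.histogram(z_obs, bins=5, range=(0.1, 2.0), weights=w)
```

The raw redshift histogram falls off where truncation bites, while the weighted one approximately recovers the uniform parent distribution; the paper's estimator builds this correction into the bivariate density itself rather than applying it per bin.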
How to Optimally Constrain Galaxy Assembly Bias: Supplement Projected Correlation Functions with Count-in-cells Statistics
Most models for the connection between galaxies and their haloes ignore the
possibility that galaxy properties may be correlated with halo properties other
than mass, a phenomenon known as galaxy assembly bias. Yet, it is known that
such correlations can lead to systematic errors in the interpretation of survey
data. At present, the degree to which galaxy assembly bias may be present in
the real Universe, and the best strategies for constraining it remain
uncertain. We study the ability of several observables to constrain galaxy
assembly bias from redshift survey data using the decorated halo occupation
distribution (dHOD), an empirical model of the galaxy--halo connection that
incorporates assembly bias. We cover an expansive set of observables, including
the projected two-point correlation function w_p(r_p), the galaxy--galaxy
lensing signal ΔΣ(r_p), the void probability function VPF(r), the
distributions of counts-in-cylinders, P(N_CIC), and counts-in-annuli,
P(N_CIA), and the distribution of the ratio of counts in cylinders of
different sizes. We find that despite the frequent use of the combination of
w_p(r_p) and ΔΣ(r_p) in interpreting galaxy data, the count statistics,
P(N_CIC) and P(N_CIA), are generally more efficient in constraining galaxy
assembly bias when combined with w_p(r_p). Constraints based upon w_p(r_p)
and ΔΣ(r_p) share common degeneracy directions in the parameter space, while
combinations of w_p(r_p) with the count statistics are more complementary.
Therefore, we strongly suggest that count statistics should be used to
complement the canonical observables in future studies of the galaxy--halo
connection.
Comment: Figures 3 and 4 show the main results. Published in Monthly Notices
of the Royal Astronomical Society
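The counts-in-cylinders statistic itself is simple to compute: for each galaxy, count companions inside a cylinder of fixed projected radius and line-of-sight depth, then histogram the counts. The sketch below does this brute-force on a uniform mock catalogue in a periodic box; the box size, cylinder dimensions, and random positions are all arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# Mock galaxy catalogue in a periodic box (arbitrary units); a real analysis
# would use redshift-space positions from a survey or a dHOD mock
L, n = 100.0, 1000
pos = rng.uniform(0, L, size=(n, 3))

def counts_in_cylinders(pos, r_p=2.0, half_depth=10.0, box=100.0):
    # For each galaxy, count companions inside a cylinder of projected
    # radius r_p and half-length half_depth along the z (line-of-sight) axis
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)               # minimum-image periodic wrap
    proj = np.hypot(d[..., 0], d[..., 1])
    in_cyl = (proj < r_p) & (np.abs(d[..., 2]) < half_depth)
    return in_cyl.sum(axis=1) - 1              # exclude the galaxy itself

n_cic = counts_in_cylinders(pos)
p_ncic = np.bincount(n_cic) / len(n_cic)       # the distribution P(N_CIC)
```

Unlike the two-point function, the full shape of P(N_CIC) retains higher-order clustering information, which is why it adds constraining power on assembly bias.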
Semi-supervised Learning for Photometric Supernova Classification
We present a semi-supervised method for photometric supernova typing. Our
approach is to first use the nonlinear dimension reduction technique diffusion
map to detect structure in a database of supernova light curves and
subsequently employ random forest classification on a spectroscopically
confirmed training set to learn a model that can predict the type of each newly
observed supernova. We demonstrate that this is an effective method for
supernova typing. As supernova numbers increase, our semi-supervised method
efficiently utilizes this information to improve classification, a property not
enjoyed by template based methods. Applied to supernova data simulated by
Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods
achieve (cross-validated) 95% Type Ia purity and 87% Type Ia efficiency on the
spectroscopic sample, but only 50% Type Ia purity and 50% efficiency on the
photometric sample due to their spectroscopic follow-up strategy. To improve
the performance on the photometric sample, we search for better spectroscopic
follow-up procedures by studying the sensitivity of our machine learned
supernova classification on the specific strategy used to obtain training sets.
With a fixed amount of spectroscopic follow-up time, we find that deeper
magnitude-limited spectroscopic surveys are better for producing training sets.
For supernova Ia (II-P) typing, we obtain a 44% (1%) increase in purity to 72%
(87%) and 30% (162%) increase in efficiency to 65% (84%) of the sample using a
25th (24.5th) magnitude-limited survey instead of the shallower spectroscopic
sample used in the original simulations. When redshift information is
available, we incorporate it into our analysis using a novel method of altering
the diffusion map representation of the supernovae. Incorporating host
redshifts leads to a 5% improvement in Type Ia purity and 13% improvement in
Type Ia efficiency.
Comment: 16 pages, 11 figures, accepted for publication in MNRAS
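The diffusion map step can be sketched directly: build a Gaussian affinity between light curves, normalize it into a Markov transition matrix, and embed each object with the leading non-trivial eigenvectors. The synthetic two-class "light curves" below are invented, and a nearest-centroid rule stands in for the random forest classifier the abstract uses.

```python
import numpy as np

rng = np.random.default_rng(6)

# Mock "light curves": two supernova-like classes with different decline
# rates (purely synthetic stand-ins for real photometry)
t = np.linspace(0.0, 1.0, 30)
def curve(rate):
    return np.exp(-rate * t) + 0.05 * rng.normal(size=t.size)
X = np.array([curve(2.0) for _ in range(60)] + [curve(6.0) for _ in range(60)])
y = np.array([0] * 60 + [1] * 60)

# Diffusion map: Gaussian affinity -> Markov matrix -> spectral embedding
eps = 0.5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
P = np.exp(-d2 / eps)
P /= P.sum(axis=1, keepdims=True)
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
embed = (vecs.real * vals.real)[:, order[1:4]]  # skip the constant eigenvector

# Classify in diffusion space with a nearest-centroid rule (a simple
# stand-in for the paper's random forest)
idx = rng.permutation(120)
idx_tr, idx_te = idx[:60], idx[60:]
cents = np.array([embed[idx_tr][y[idx_tr] == c].mean(0) for c in (0, 1)])
pred = ((embed[idx_te][:, None, :] - cents) ** 2).sum(-1).argmin(1)
acc = (pred == y[idx_te]).mean()
```

Because the embedding is driven by the geometry of the light-curve set rather than by templates, new unlabeled supernovae refine the representation itself, which is the semi-supervised property the abstract highlights.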